
Webb Miller and Trey Ideker To Receive Top International Bioinformatics Awards for 2009 from the International Society for Computational Biology

Author: B. J. Morrison McKay
BP Kelley
Clare Sansom
E Lee
P Shannon
S Suthram
SF Altschul
SF Altschul
T Ideker
W Miller
W Miller
Publication venue: Public Library of Science
Publication date: 01/04/2009

Digital Repository @ Iowa State University (ISU)

Pairwise statistical significance of local sequence alignment using multiple parameter sets and empirical justification of parameter set change penalty

Author: A Agrawal
A Agrawal
AA Schäffer
AK Hartmann
Ankit Agrawal
AY Mitrophanov
CA Orengo
J Rocha
M Kschischo
M Pagni
ML Sierk
MS Waterman
P Bucher
PH Sellers
R Mott
R Mott
R Mott
R Olsen
RF Mott
S Grossmann
S Karlin
S Kotz
S Sheetlin
S Wolfsheimer
SE Brenner
SF Altschul
SF Altschul
SF Altschul
SF Altschul
SF Altschul
TF Smith
WR Pearson
WR Pearson
WR Pearson
WR Pearson
WR Pearson
X Huang
X Huang
Xiaoqiu Huang
YK Yu
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Accurate estimation of the statistical significance of a pairwise alignment is an important problem in sequence comparison. Recently, a comparative study of pairwise statistical significance with database statistical significance was conducted. In this paper, we extend the earlier work on pairwise statistical significance by incorporating the use of multiple parameter sets. Results: Results for a knowledge discovery application of homology detection reveal that using multiple parameter sets for pairwise statistical significance estimates gives better coverage than using a single parameter set, at least at some error levels. Further, the results of pairwise statistical significance using multiple parameter sets are shown to be significantly better than the database statistical significance estimates reported by BLAST and PSI-BLAST, and comparable to, and at times significantly better than, those of SSEARCH. Using non-zero parameter set change penalty values gives better performance than a zero penalty. Conclusion: The fact that homology detection performance does not degrade when using multiple parameter sets is strong evidence for the validity of the assumption that the alignment score distribution follows an extreme value distribution even when multiple parameter sets are used. The parameter set change penalty is a useful parameter for alignment using multiple parameter sets. Pairwise statistical significance using multiple parameter sets can be used effectively to determine the relatedness of one pair, or a few pairs, of sequences without performing a time-consuming database search.
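The extreme value assumption mentioned in the abstract above can be illustrated with a minimal sketch. It assumes that null alignment scores for one sequence pair are already available (for example, from aligning against shuffled sequences) and fits a Gumbel distribution by the method of moments. The fitting method and the function names `fit_gumbel_moments` and `pairwise_p_value` are illustrative assumptions, not the procedure used in the paper.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(null_scores):
    """Estimate Gumbel (lambda, mu) from null scores by the method of moments.

    For a Gumbel (maximum) distribution:
      variance = pi^2 / (6 * lambda^2)
      mean     = mu + gamma / lambda
    """
    mean = statistics.fmean(null_scores)
    stdev = statistics.stdev(null_scores)
    lam = math.pi / (stdev * math.sqrt(6.0))
    mu = mean - EULER_GAMMA / lam
    return lam, mu


def pairwise_p_value(score, lam, mu):
    """P(S >= score) under the fitted Gumbel distribution."""
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -math.expm1(-math.exp(-lam * (score - mu)))


# Hypothetical null scores for one pair (assumed to come from shuffled alignments).
null_scores = [31, 28, 35, 30, 33, 29, 32, 36, 27, 34]
lam, mu = fit_gumbel_moments(null_scores)
print(pairwise_p_value(40, lam, mu))
```

A higher observed score yields a smaller P-value, so a score far above the null scores suggests that the two sequences are related.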

Novel 3D protein structural homology search algorithm based on the Triangle ID

Author: Altschul SF et al
Ben-Hur A et al
ITO Nobutoshi
SATO SHOICHI
Publication venue: Division of Chemical Information and Computer Sciences
Publication date: 01/01/2005

PLAST: parallel local alignment search tool for database comparison

Author: A Jacob
D Lavenier
Dominique Lavenier
GM Amdahl
H Zhang
Hoa Van Nguyen
KM Chao
M Farrar
M Gertz
M Pop
M Roytberg
N Firasta
S Karlin
SF Altschul
SF Altschul
SF Altschul
T Rognes
TF Smith
V Sachdeva
W Hu
W Liu
WR Pearson
X Fei
YK Yu
YK Yu
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Sequence similarity searching is an important and challenging task in molecular biology and next-generation sequencing should further strengthen the need for faster algorithms to process such vast amounts of data. At the same time, the internal architecture of current microprocessors is tending towards more parallelism, leading to the use of chips with two, four and more cores integrated on the same die. The main purpose of this work was to design an effective algorithm to fit with the parallel capabilities of modern microprocessors. Results: A parallel algorithm for comparing large genomic banks and targeting middle-range computers has been developed and implemented in PLAST software. The algorithm exploits two key parallel features of existing and future microprocessors: the SIMD programming model (SSE instruction set) and the multithreading concept (multicore). Compared to multithreaded BLAST software, tests performed on an 8-processor server have shown speedup ranging from 3 to 6 with a similar level of accuracy. Conclusions: A parallel algorithmic approach driven by the knowledge of the internal microprocessor architecture allows significant speedup to be obtained while preserving standard sensitivity for similarity search problems.
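The multithreading idea described in the abstract above (splitting a sequence bank across cores) can be sketched as follows. This is only an illustration under stated assumptions: the shared k-mer count is a hypothetical stand-in for a real alignment score, the names `kmer_set`, `score_chunk`, and `parallel_search` are invented for this example, and nothing here reproduces PLAST's actual algorithm or its SIMD (SSE) kernels.

```python
from concurrent.futures import ThreadPoolExecutor


def kmer_set(seq, k=3):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def score_chunk(query_kmers, chunk, k=3):
    """Score every (name, sequence) entry in one chunk of the database.

    The score is the number of distinct k-mers shared with the query,
    a toy stand-in for a real alignment score.
    """
    return [(name, len(query_kmers & kmer_set(seq, k))) for name, seq in chunk]


def parallel_search(query, database, workers=4, k=3):
    """Split the database into chunks and score the chunks concurrently."""
    query_kmers = kmer_set(query, k)
    size = max(1, -(-len(database) // workers))  # ceiling division
    chunks = [database[i:i + size] for i in range(0, len(database), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda c: score_chunk(query_kmers, c, k), chunks))
    hits = [hit for part in parts for hit in part]
    # Best score first; ties are broken by name for a stable order.
    return sorted(hits, key=lambda h: (-h[1], h[0]))


# Hypothetical toy database of (name, sequence) pairs.
database = [("a", "ACGTACGT"), ("b", "TTTTTTTT"), ("c", "ACGTTTTT")]
print(parallel_search("ACGTACGT", database, workers=2))
```

In standard CPython builds with the global interpreter lock, threads running pure Python code will not deliver the kind of speedup reported in the abstract. The sketch shows only the partitioning pattern, which a compiled implementation would combine with vectorized scoring.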

HAL-CentraleSupelec

CiteSeerX

INRIA a CCSD electronic archive server

HAL-Rennes 1

IMAD: flexible annotation of microarray sequences

Author: D Smedley
Dennis Prickett
DL Wheeler
Michael Watson
P Neerincx
SF Altschul
W Swinkels
Publication venue: BioMed Central
Publication date: 01/01/2009

Edinburgh Research Explorer

Accelerating exhaustive pairwise metagenomic comparisons

Author: A Alyass
B Nichols
BD Ondov
CD Polychronopoulos
G Benoit
G Jing
H Li
JA Hanley
MLV Pitteway
O Gotoh
O Torreno
SF Altschul
Y Liu
Y Liu
Publication venue: Springer, Cham
Publication date: 01/01/2017

In this manuscript, we present an optimized and parallel version of our previous work IMSAME, an exhaustive gapped aligner for the accurate pairwise comparison of metagenomes. Parallelization strategies are applied to take advantage of modern multiprocessor architectures. In addition, sequential optimizations in CPU time and memory consumption are provided. These algorithmic and computational enhancements enable IMSAME to calculate near-optimal alignments, which are used to directly assess similarity between metagenomes without requiring reference databases. We show that the overall efficiency of the parallel implementation exceeds 80% while retaining scalability as the number of parallel cores increases. Moreover, we show that sequential optimizations yield up to 8x speedup for scenarios with larger data. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tec

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Institucional Universidad de Málaga

FAAST: Flow-space Assisted Alignment Search Tool

Author: Bengt Persson
Björn Andersson
DJ Lipman
Fredrik Lysholm
J Jerlström-Hultqvist
M Droege
M Margulies
MO Dayhoff
O Gotoh
R Kofler
S Balzer
SB Needleman
SF Altschul
SF Altschul
TF Smith
V Vacic
WR Pearson
Z Ning
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: High-throughput pyrosequencing (454 sequencing) is the major sequencing platform for producing long-read, high-throughput data. While most other sequencing techniques produce read errors mainly comparable with substitutions, pyrosequencing produces errors mainly comparable with gaps. These errors are detected less efficiently by most conventional alignment programs and may produce inaccurate alignments. Results: We suggest a novel algorithm for calculating the optimal local alignment that utilises flowpeak information in order to improve alignment accuracy. Flowpeak information can be retained from a 454 sequencing run through interpretation of the binary SFF file format. This novel algorithm has been implemented in a program named FAAST (Flow-space Assisted Alignment Search Tool). Conclusions: We present and discuss the results of simulations that show that FAAST, through the use of the novel algorithm, can gain several percentage points of accuracy compared to Smith-Waterman-Gotoh alignments, depending on the 454 data quality. Furthermore, through an efficient multi-thread-aware implementation, FAAST is able to perform these high-quality alignments at high speed. The tool is available at http://www.ifm.liu.se/bioinfo/

Publikationer från Linköpings universitet

Digitala Vetenskapliga Arkivet - Academic Archive On-line

FAAST: Flow-space Assisted Alignment Search Tool

Author: Fredrik Lysholm
Björn Andersson
Bengt Persson
M Margulies
M Droege
SB Needleman
TF Smith
O Gotoh
DJ Lipman
WR Pearson
SF Altschul
SF Altschul
MO Dayhoff
V Vacic
R Kofler
S Balzer
J Jerlström-Hultqvist
Z Ning
Publication venue: BioMed Central
Publication date: 01/01/2011

Publikationer från Linköpings universitet

Aston Publications Explorer

Digitala Vetenskapliga Arkivet - Academic Archive On-line

A Probabilistic Model of Local Sequence Alignment That Simplifies Statistical Significance Estimation

Author: A Krogh
A Marchler-Bauer
A Milosavljević
A Pertsemlidis
AA Schäffer
AY Mitrophanov
BJ Webb
Burkhard Rost
C Barrett
C Webber
D Drasdo
D Metzler
D Siegmund
DJC MacKay
EJ Gumbel
EP Nawrocki
ET Jaynes
I Letunic
J Park
JD Storey
JF Lawless
JS Liu
K Karplus
K Karplus
K Sjölander
M Madera
MG Kann
MQ Zhang
MS Waterman
N Chia
P Bucher
R Bundschuh
R Durbin
R Mott
R Mott
R Mott
R Olsen
RC Edgar
RD Finn
S Johnson
S Karlin
S Karlin
S Miyazawa
Sean R. Eddy
SF Altschul
SF Altschul
SF Altschul
SF Altschul
SF Altschul
SF Altschul
SR Eddy
SR Eddy
TF Smith
WR Pearson
Y-K Yu
Y-K Yu
Y-K Yu
Y-K Yu
Publication venue: Public Library of Science
Publication date: 01/05/2008

Sequence database searches require accurate estimation of the statistical significance of scores. Optimal local sequence alignment scores follow Gumbel distributions, but determining an important parameter of the distribution (λ) requires time-consuming computational simulation. Moreover, optimal alignment scores are less powerful than probabilistic scores that integrate over alignment uncertainty (“Forward” scores), but the expected distribution of Forward scores remains unknown. Here, I conjecture that both expected score distributions have simple, predictable forms when full probabilistic modeling methods are used. For a probabilistic model of local sequence alignment, optimal alignment bit scores (“Viterbi” scores) are Gumbel-distributed with constant λ = log 2, and the high-scoring tail of Forward scores is exponential with the same constant λ. Simulation studies support these conjectures over a wide range of profile/sequence comparisons, using 9,318 profile hidden Markov models from the Pfam database. This enables efficient and accurate determination of expectation values (E-values) for both Viterbi and Forward scores for probabilistic local alignments.
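The two conjectured distributions in the abstract above can be written out as a short sketch. It takes λ = log 2 from the abstract and treats the location parameters `mu` and `tau` as assumed inputs that would have to be estimated separately. The function names are hypothetical, and the E-value is computed with the common convention of multiplying the P-value by the number of comparisons.

```python
import math

LAMBDA = math.log(2.0)  # constant lambda for bit scores, as stated in the abstract


def viterbi_p_value(bit_score, mu):
    """P(S >= bit_score) for a Gumbel distribution with lambda = log 2.

    mu is the Gumbel location parameter (an assumed, separately fitted input).
    """
    return -math.expm1(-math.exp(-LAMBDA * (bit_score - mu)))


def forward_tail_p_value(bit_score, tau):
    """Exponential high-scoring tail with the same lambda.

    tau is the tail location parameter (an assumed, separately fitted input).
    The result is capped at 1 because the exponential form describes only the tail.
    """
    return min(1.0, math.exp(-LAMBDA * (bit_score - tau)))


def e_value(p_value, n_comparisons):
    """Expected number of hits at least this good in n_comparisons trials."""
    return p_value * n_comparisons


# Hypothetical example: a 20-bit Forward score, tau = 0, database of 1,000,000 sequences.
print(e_value(forward_tail_p_value(20.0, 0.0), 1_000_000))
```

With λ = log 2, each additional bit of score halves the exponential tail probability, which is why no per-model simulation is needed to obtain λ under this conjecture.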

Public Library of Science (PLOS)